© John Wiley & Sons, Inc.

FIGURE 19-5: Linear and exponential trends fitted to accident data.

Working with unequal observation intervals

In this fatal accident example, each of the 12 data points represents the accidents observed during a one-year interval. But imagine analyzing the frequency of emergency department visits by patients after they've been treated for emphysema, where there is one data point per patient. In that case, the width of the observation interval (how long each patient was followed) may vary from one individual in the data to another. GLM lets you provide an interval width along with the event count for each individual in the data. For arcane reasons, many statistical programs refer to this interval-width variable as the offset. In R, for example, when you use the usual log link, you supply the logarithm of the interval width as the offset.
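As a minimal sketch in R, the glm call below fits an event rate when patients are observed for different lengths of time. The data frame `ed` and its columns `visits` and `years` are hypothetical, made up only for illustration. With the log link, passing `offset = log(years)` tells glm to model visits per year of follow-up instead of raw visit counts.

```r
# Hypothetical data: emergency department (ED) visits counted for each
# patient, with follow-up periods of different lengths (in years)
ed <- data.frame(
  visits = c(2, 0, 5, 1, 3, 4, 0, 2),
  years  = c(1.0, 0.5, 2.0, 0.75, 1.5, 3.0, 0.25, 1.0)
)

# With the log link, log(years) enters the model with its coefficient
# fixed at 1, so the intercept describes the log of visits per year
fit_rate <- glm(visits ~ 1, family = poisson(link = "log"),
                offset = log(years), data = ed)

# exp(intercept) is the estimated visit rate per patient-year
rate_hat <- exp(coef(fit_rate)[["(Intercept)"]])
print(rate_hat)
```

Because this model has no predictors, the estimated rate should equal total visits divided by total follow-up time, 17 / 10 = 1.7 visits per patient-year. R also accepts the offset inside the formula, as in `visits ~ 1 + offset(log(years))`.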

Accommodating clustered events

The Poisson distribution applies when the observed events are all independent occurrences. But this assumption isn’t met if events occur in clusters. Suppose you count individual highway fatalities instead of fatal highway accidents. In that case, the Poisson distribution doesn’t apply, because one fatal accident may kill several people. This is what is meant by clustered events.
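A quick simulation in R can make the clustering problem concrete. The sketch below uses hypothetical numbers chosen only for illustration: about 20 fatal accidents per year, occurring independently, with each accident killing 1 person plus a Poisson(1.5) number of additional people. It compares, for accidents and for fatalities, the SD of the yearly counts with the square root of their mean.

```r
# Hypothetical simulation: clustered deaths versus independent accidents
set.seed(42)
n_years <- 5000

# Fatal accidents per year: independent events, Poisson with mean 20
accidents <- rpois(n_years, lambda = 20)

# Each accident kills 1 + Poisson(1.5) people, so deaths arrive in clusters
fatalities <- vapply(accidents,
                     function(n) sum(1 + rpois(n, lambda = 1.5)),
                     numeric(1))

# Compare the observed SD with the SD a Poisson distribution would have
compare_sd <- function(x) c(mean = mean(x), sd = sd(x), sqrt_mean = sqrt(mean(x)))
print(rbind(accidents  = compare_sd(accidents),
            fatalities = compare_sd(fatalities)))

# Variance-to-mean ratio: about 1 for Poisson data, above 1 when clustered
ratio_accidents  <- var(accidents) / mean(accidents)
ratio_fatalities <- var(fatalities) / mean(fatalities)
print(c(accidents = ratio_accidents, fatalities = ratio_fatalities))
```

Under these assumptions, the accident counts should have an SD close to the square root of their mean, while the fatality counts should have a clearly larger SD. For deaths per accident K = 1 + Poisson(1.5), theory predicts a variance-to-mean ratio of E[K²]/E[K] = 7.75/2.5 = 3.1.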

The standard deviation (SD) of a Poisson distribution is equal to the square root of the mean of the distribution. But if clustering is present, the SD of the data is larger than the square root of the mean. This situation is called overdispersion, and it makes the standard errors and p values from an ordinary Poisson model too small. GLM in R can correct for overdispersion by estimating a dispersion factor from the data and scaling the standard errors accordingly. To do this, designate the distribution family quasipoisson rather than poisson, like this:

glm(formula = Accidents ~ Year, family = quasipoisson(link = "log"))
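The sketch below fits the same trend model with family = poisson and with family = quasipoisson so you can compare the two. It uses hypothetical simulated data: 12 yearly fatality counts from a slowly declining number of accidents, each accident killing 1 + Poisson(1.5) people. The variable names `Year` and `Fatalities` and all the numbers are assumptions for illustration. The comparison shows identical coefficients, while the quasipoisson standard errors equal the Poisson ones multiplied by the square root of the estimated dispersion.

```r
set.seed(2024)

# Hypothetical data: 12 years, with the expected number of fatal accidents
# declining by about 5% per year
Year <- 1:12
Accidents <- rpois(length(Year), lambda = 30 * exp(-0.05 * Year))

# Deaths come in clusters: each accident kills 1 + Poisson(1.5) people
Fatalities <- vapply(Accidents,
                     function(n) sum(1 + rpois(n, lambda = 1.5)),
                     numeric(1))

fit_pois  <- glm(Fatalities ~ Year, family = poisson(link = "log"))
fit_quasi <- glm(Fatalities ~ Year, family = quasipoisson(link = "log"))

# Same point estimates from both families
print(cbind(poisson = coef(fit_pois), quasipoisson = coef(fit_quasi)))

# Estimated dispersion factor (1 would mean no overdispersion)
disp <- summary(fit_quasi)$dispersion
print(disp)

# Standard errors: quasipoisson SE = Poisson SE * sqrt(dispersion)
se_pois  <- summary(fit_pois)$coefficients[, "Std. Error"]
se_quasi <- summary(fit_quasi)$coefficients[, "Std. Error"]
print(cbind(se_pois, se_quasi, ratio = se_quasi / se_pois))
```

Because the point estimates match, the fitted trend is the same under either family; only the confidence intervals and p values change.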